A Talking Face Driven by Voice using Hidden Markov Model
Authors
Abstract
In this paper, we use a Hidden Markov Model (HMM) as a mapping mechanism between two different kinds of correlated signals. Specifically, we developed a voice-driven talking head system by exploiting the physical relationship between the shape of the mouth and the sound that is produced. The proposed system can be easily trained, and a talking head can be efficiently animated. In the training phase, Mel-scale Frequency Cepstral Coefficients (MFCC) were extracted from the audio signals and Facial Animation Parameters (FAP) were extracted from the video signals. Both audio and video features were then integrated to train a single HMM. In the synthesis phase, the HMM was used to map a completely novel audio track to a FAP sequence, which was rendered by a Facial Animation Engine (FAE). The experiments demonstrated the proposed voice-driven talking head for both male and female subjects, in two styles (speaking and singing) and in three languages (Chinese, English and Taiwanese). Possible applications of the proposed system include computer-aided instruction, online guides, virtual conferencing, lip synchronization and human-computer interaction.
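The synthesis phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each HMM state carries an isotropic Gaussian over audio features together with one mean FAP vector, decodes the state path from audio alone with the Viterbi algorithm, and emits the FAP mean of each decoded state. All names (`audio_to_fap`, `audio_means`, `fap_means`) and all numbers are hypothetical toy values, and the two-dimensional "audio" frames stand in for real MFCC vectors.

```python
import numpy as np

def gaussian_loglik(x, means, var):
    """Log-likelihood of every frame under every state's isotropic Gaussian.

    x: (T, D) frames, means: (N, D) state means. Returns a (T, N) array.
    """
    diff = x[:, None, :] - means[None, :, :]
    d = x.shape[1]
    return -0.5 * ((diff ** 2).sum(-1) / var + d * np.log(2 * np.pi * var))

def viterbi(log_pi, log_A, log_B):
    """Most likely state path given log emission scores log_B of shape (T, N)."""
    T, N = log_B.shape
    delta = np.empty((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        # scores[i, j]: best score ending in state i at t-1, then moving to j
        scores = delta[t - 1][:, None] + log_A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def audio_to_fap(audio, pi, A, audio_means, fap_means, var=1.0):
    """Decode states from audio frames, then emit each state's mean FAP vector."""
    log_B = gaussian_loglik(audio, audio_means, var)
    path = viterbi(np.log(pi), np.log(A), log_B)
    return fap_means[path], path

# Toy model (assumed values): state 0 = mouth closed, state 1 = mouth open.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
audio_means = np.array([[0.0, 0.0],
                        [5.0, 5.0]])
fap_means = np.array([[0.0],
                      [1.0]])

audio = np.array([[0.1, -0.2],
                  [0.3, 0.1],
                  [4.8, 5.1],
                  [5.2, 4.9]])
fap, path = audio_to_fap(audio, pi, A, audio_means, fap_means)
print(path.tolist())  # decoded state index per frame
print(fap.ravel())    # one FAP value per frame
```

In a real system the model parameters would be estimated from joint audio-visual training data rather than set by hand, and the piecewise-constant FAP output of this sketch would normally be smoothed before being passed to an animation engine.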
Similar resources
Real-time lip-synch face animation driven by human voice
In this demo, we present a technique for synthesizing the mouth movement from acoustic speech information. The algorithm maps the audio parameter set to the visual parameter set using the Gaussian Mixture Model and the Hidden Markov Model. With this technique, we can create smooth and realistic lip movements.
Text Driven 3D Photo-Realistic Talking Head
We propose a new 3D photo-realistic talking head with a personalized, photo realistic appearance. Different head motions and facial expressions can be freely controlled and rendered. It extends our prior, high-quality, 2D photo-realistic talking head to 3D. Around 20-minutes of audio-visual 2D video are first recorded with read prompted sentences spoken by a speaker. We use a 2D-to-3D reconstru...
A new language independent, photo-realistic talking head driven by voice only
We propose a new photo-realistic, voice driven only (i.e. no linguistic info of the voice input is needed) talking head. The core of the new talking head is a context-dependent, multilayer, Deep Neural Network (DNN), which is discriminatively trained over hundreds of hours, speaker independent speech data. The trained DNN is then used to map acoustic speech input to 9,000 tied “senone” states p...
Effect of Sensor Fusion for Recognition of Emotional States Using Voice, Face Image and Thermal Image of Face
A new integration method is presented to recognize human emotional expressions. We attempt to use both voices and facial expressions. For voices, we use such prosodic parameters as pitch signals, energy, and their derivatives, which are trained by Hidden Markov Model (HMM) for recognition. For facial expressions, we use feature parameters from thermal images in addition to visible images...
Animation of a Hierarchical Appearance Based Facial Model and Perceptual Analysis of Visual Speech
In this Thesis a hierarchical image-based 2D talking head model is presented, together with robust automatic and semi-automatic animation techniques, and a novel perceptual method for evaluating visual-speech based on the McGurk effect. The novelty of the hierarchical facial model stems from the fact that sub-facial areas are modelled individually. To produce a facial animation, animations for ...
Journal title:
- J. Inf. Sci. Eng.
Volume 22, Issue
Pages -
Publication date 2006